NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The impact of heterogeneous shared leadership in scientific teams

https://doi.org/10.1016/j.ipm.2023.103542

Xu, Huimin; Liu, Meijun; Bu, Yi; Sun, Shujing; Zhang, Yi; Zhang, Chenwei; Acuna, Daniel E.; Gray, Steven; Meyer, Eric; Ding, Ying (January 2024, Information Processing & Management)

Leadership is evolving from an individual endeavor to a shared effort. This paper aims to advance our understanding of shared leadership in scientific teams. We define three kinds of leaders based on career age: junior (10–15), mid (15–20), and senior (20+). By considering the combinations of any two leaders, we classify shared leadership as “heterogeneous” when leaders are in different age cohorts and “homogeneous” when leaders are in the same age cohort. Drawing on 1,845,351 CS, 254,039 Sociology, and 193,338 Business teams with two leaders in the OpenAlex dataset, we find that heterogeneous shared leadership brings higher citation impact for teams than homogeneous shared leadership. Specifically, pairing junior leaders with senior leaders significantly increases team citation ranking by 1–2% compared with two leaders of similar age. We explore the patterns of homogeneous and heterogeneous leaders from the perspectives of team scale, expertise composition, and knowledge recency. Compared with homogeneous leaders, heterogeneous leaders are more impactful in large teams, have more diverse expertise, and draw on both the newest and the oldest references.
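The cohort rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and because the stated ranges share their endpoints (15 and 20), the sketch assumes half-open intervals [10, 15), [15, 20), and [20, ∞), leaving career ages below 10 without a leader cohort.

```python
from typing import Optional


# Career-age cohorts from the abstract: junior (10–15), mid (15–20), senior (20+).
# ASSUMPTION: shared endpoints are resolved as half-open intervals, and ages
# below 10 are not assigned to any leader cohort.
def cohort(career_age: int) -> Optional[str]:
    if career_age >= 20:
        return "senior"
    if career_age >= 15:
        return "mid"
    if career_age >= 10:
        return "junior"
    return None


def shared_leadership_type(age_a: int, age_b: int) -> Optional[str]:
    """Label a two-leader team 'heterogeneous' or 'homogeneous' by cohort."""
    a, b = cohort(age_a), cohort(age_b)
    if a is None or b is None:
        return None  # hypothetical choice: pairs outside the cohorts stay unlabeled
    return "homogeneous" if a == b else "heterogeneous"


print(shared_leadership_type(12, 25))  # junior paired with senior
print(shared_leadership_type(16, 19))  # two mid-career leaders
```

Under these assumptions, the junior and senior pairing that the abstract highlights is simply the heterogeneous case whose two cohorts are furthest apart.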
Tab-Cleaner: Weakly Supervised Tabular Data Cleaning via Pre-training for E-commerce Catalog

https://doi.org/10.18653/v1/2023.acl-industry.18

Cheng, Kewei; Li, Xian; Wang, Zhengyang; Zhang, Chenwei; Huang, Binxuan; Xu, Yifan Ethan; Dong, Xin Luna; Sun, Yizhou (July 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics)

Product catalogs, conceptually in the form of text-rich tables, are self-reported by individual retailers and thus inevitably contain noisy facts. Verifying such textual attributes in product catalogs is essential to improving their reliability. However, popular methods for processing free-text content, such as pre-trained language models, are not particularly effective on structured tabular data because they are typically trained on free-form natural language text. In this paper, we present Tab-Cleaner, a model designed to detect errors in text-rich tabular data following a pre-training/fine-tuning paradigm. We train Tab-Cleaner on a real-world Amazon Product Catalog table covering millions of products and show improvements over state-of-the-art methods of 16% in PR AUC on the attribute applicability classification task and 11% in PR AUC on the attribute value validation task.
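The results above are reported in PR AUC (area under the precision-recall curve). One common way to estimate it is average precision, sketched below in Python for readers unfamiliar with the metric. This is an illustrative assumption about how such a score can be computed, not the evaluation code used for Tab-Cleaner; the function name is hypothetical, label 1 is assumed to mark a flagged (erroneous) attribute, and tied scores are broken by input order.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values at each positive's rank.

    labels: 1 for a positive (e.g. an attribute flagged as an error), 0 otherwise.
    scores: model confidence that each item is positive.
    Ties in score keep their input order (Python's sort is stable).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at the recall step this positive adds
    return ap / total_pos


# Positives at ranks 1 and 3 give (1/1 + 2/3) / 2.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Libraries such as scikit-learn provide a tested implementation of the same quantity, which is preferable for real evaluation.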
Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs

https://doi.org/10.18653/v1/2023.findings-acl.642

Huang, Zijie; Wang, Daheng; Huang, Binxuan; Zhang, Chenwei; Shang, Jingbo; Liang, Yan; Wang, Zhengyang; Li, Xian; Faloutsos, Christos; Sun, Yizhou; et al. (July 2023, Findings of the Association for Computational Linguistics: ACL 2023)

Knowledge graph embeddings (KGE) have been extensively studied as a way to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between the two views and lacks probabilistic semantics for concepts’ granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjointness among them. Box volumes can be interpreted as concepts’ granularity. Unlike concepts, entities are modeled as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly created industrial KG show the effectiveness of Concept2Box.
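A minimal Python sketch of the box-embedding ideas in this abstract, assuming axis-aligned boxes stored as lower and upper corners: volume as a proxy for granularity, containment for hierarchy, and intersection volume for overlap versus disjointness. The `Box` class and `point_to_box_distance` are hypothetical names, and the distance shown is a plain Euclidean distance to the nearest point of the box, substituted for illustration; it is not the paper's vector-to-box metric, and no training is shown.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Box:
    """Axis-aligned box for a concept, given by its lower and upper corners."""
    low: Sequence[float]
    high: Sequence[float]

    def volume(self) -> float:
        # Product of side lengths; a larger box stands for a coarser concept.
        return math.prod(max(h - l, 0.0) for l, h in zip(self.low, self.high))

    def contains_box(self, other: "Box") -> bool:
        # Containment models a hierarchy edge (parent box encloses child box).
        return all(sl <= ol and oh <= sh
                   for sl, sh, ol, oh in zip(self.low, self.high, other.low, other.high))

    def intersection_volume(self, other: "Box") -> float:
        # Zero means the concepts are disjoint; a positive value means overlap.
        return math.prod(max(min(sh, oh) - max(sl, ol), 0.0)
                         for sl, sh, ol, oh in zip(self.low, self.high, other.low, other.high))


def point_to_box_distance(point: Sequence[float], box: Box) -> float:
    """Euclidean distance from an entity vector to the nearest point of a box.

    Hypothetical stand-in for the paper's metric: 0 when the point lies inside.
    """
    squared = 0.0
    for x, l, h in zip(point, box.low, box.high):
        gap = max(l - x, 0.0, x - h)  # how far x falls outside [l, h] on this axis
        squared += gap * gap
    return math.sqrt(squared)


animal = Box([0.0, 0.0], [4.0, 4.0])
dog = Box([1.0, 1.0], [2.0, 2.0])
plant = Box([5.0, 5.0], [6.0, 6.0])
print(animal.volume(), dog.volume())         # coarser concept, larger volume
print(animal.contains_box(dog))              # hierarchy via containment
print(animal.intersection_volume(plant))     # disjoint concepts
print(point_to_box_distance([5.0, 1.0], animal))
```

Keeping entities as points and concepts as regions is what lets a single distance function relate the two views; a learned model would replace these hand-set corners with trainable parameters.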
Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

https://doi.org/10.1145/3442381.3450114

Zhang, Xinyang; Zhang, Chenwei; Dong, Xin Luna; Shang, Jingbo; Han, Jiawei (April 2021, WWW '21: The Web Conference 2021)
Text categorization is an essential task in Web content analysis. Given ever-evolving Web data and newly emerging categories, rather than the laborious supervised setting, this paper focuses on the minimally supervised setting, which aims to categorize documents effectively with only a couple of seed documents annotated per category. We recognize that texts collected from the Web are often structure-rich, i.e., accompanied by various metadata. One can easily organize the corpus into a text-rich network, with raw text documents, document attributes, high-quality phrases, and label surface names as nodes, and their associations as edges. Such a network provides a holistic view of the corpus’ heterogeneous data sources and enables joint optimization of network-based analysis and deep textual model training. We therefore propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases: a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Each module generates pseudo training labels from the unlabeled document set, and the two modules mutually enhance each other by co-training on pooled pseudo labels. We test our model on two real-world datasets. On the challenging e-commerce product categorization dataset with 683 categories, our experiments show that given only three seed documents per category, our framework achieves an accuracy of about 92%, significantly outperforming all compared methods; its accuracy is less than 2% below that of a supervised BERT model trained on about 50K labeled documents.
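The pooling step of the co-training loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' framework: `pool_pseudo_labels`, the 0.9 confidence threshold, and the conflict rule (keep the more confident module's label) are all hypothetical choices, and the two modules themselves are represented only by their prediction dictionaries.

```python
from typing import Dict, Tuple

# Each module maps document id -> (predicted label, confidence in [0, 1]).
Predictions = Dict[str, Tuple[str, float]]


def pool_pseudo_labels(text_preds: Predictions,
                       net_preds: Predictions,
                       threshold: float = 0.9) -> Dict[str, str]:
    """Pool confident pseudo labels from a text module and a network module.

    Hypothetical rule: keep predictions at or above `threshold`; when both
    modules are confident but disagree, keep the more confident label.
    """
    pooled: Dict[str, Tuple[str, float]] = {}
    for preds in (text_preds, net_preds):
        for doc_id, (label, conf) in preds.items():
            if conf < threshold:
                continue  # not confident enough to serve as a training label
            if doc_id not in pooled or conf > pooled[doc_id][1]:
                pooled[doc_id] = (label, conf)
    return {doc_id: label for doc_id, (label, _) in pooled.items()}


text_module_out = {"d1": ("phone", 0.95), "d2": ("case", 0.50), "d3": ("charger", 0.92)}
network_module_out = {"d1": ("phone", 0.91), "d2": ("cable", 0.93), "d3": ("cable", 0.97)}
print(pool_pseudo_labels(text_module_out, network_module_out))
```

In a full co-training round, both modules would then be retrained on the seed documents plus this pooled set, so each module learns from documents that only the other module labeled confidently.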
Low-shot Learning in Natural Language Processing

https://doi.org/10.1109/CogMI50398.2020.00031

Xia, Congying; Zhang, Chenwei; Zhang, Jiawei; Liang, Tingting; Peng, Hao; Yu, Philip S. (October 2020, 2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI))
MCVAE: Margin-based Conditional Variational Autoencoder for Relation Classification and Pattern Generation

https://doi.org/10.1145/3308558.3313436

Ma, Fenglong; Li, Yaliang; Zhang, Chenwei; Gao, Jing; Du, Nan; Fan, Wei (May 2019, The World Wide Web Conference)

Analyzing knowledge entities about COVID-19 using entitymetrics

https://doi.org/10.1007/s11192-021-03933-y

Yu, Qi; Wang, Qi; Zhang, Yafei; Chen, Chongyan; Ryu, Hyeyoung; Park, Namu; Baek, Jae-Eun; Li, Keyuan; Wu, Yifei; Li, Daifeng; et al. (May 2021, Scientometrics)

AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

https://doi.org/10.1145/3394486.3403323

Dong, Xin Luna; He, Xiang; Kan, Andrey; Li, Xian; Liang, Yan; Ma, Jun; Xu, Yifan Ethan; Zhang, Chenwei; Zhao, Tong; Blanco Saldana, Gabriel; et al. (July 2020, KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
